A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
Authors
Abstract
This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric involved concerns reward variability during the whole process, and future deviations are discounted to their present values. This discounted variance yields a reward function that depends on the mean, and this dependency renders traditional dynamic programming methods inapplicable, since it suppresses a crucial property: time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP into a standard one with a redefined reward function, and derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel structure for the discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, mean-variance optimizations in discounted and average MDPs. Furthermore, convergence analyses missing from the literature can be complemented with the proposed framework as well. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
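The bilevel structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a toy two-state, two-action MDP, a variance-penalized transformed reward of the hypothetical form r - β(r - λ)², and a per-step normalization (1 - γ)V for the pseudo-mean update; all model parameters and names are invented for illustration.

```python
# Illustrative bilevel mean-variance scheme on a toy 2-state, 2-action MDP.
# Outer loop: fix a pseudo-mean lam. Inner loop: standard value iteration
# on a transformed reward r - BETA * (r - lam)**2 (an assumed penalized form).
GAMMA, BETA = 0.9, 0.5
S, A = 2, 2
P = [[[0.8, 0.2], [0.3, 0.7]],   # P[s][a][s_next]: toy transition model
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 2.0],                 # R[s][a]: toy reward model
     [0.5, 3.0]]

def value_iteration(reward, tol=1e-8):
    """Standard discounted value iteration; returns a greedy policy."""
    V = [0.0] * S
    while True:
        Q = [[reward[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(S))
              for a in range(A)] for s in range(S)]
        V_new = [max(Q[s]) for s in range(S)]
        if max(abs(V_new[s] - V[s]) for s in range(S)) < tol:
            return [max(range(A), key=lambda a: Q[s][a]) for s in range(S)]
        V = V_new

def mean_value(policy, tol=1e-8):
    """Policy evaluation under the ORIGINAL reward R (mean criterion)."""
    V = [0.0] * S
    while True:
        V_new = [R[s][policy[s]] +
                 GAMMA * sum(P[s][policy[s]][t] * V[t] for t in range(S))
                 for s in range(S)]
        if max(abs(V_new[s] - V[s]) for s in range(S)) < tol:
            return V_new
        V = V_new

# Bilevel loop: inner VI solves the transformed standard MDP, outer step
# resets the pseudo-mean to the new policy's (per-step-scale) mean reward.
lam = 0.0
for _ in range(100):
    r_tilde = [[R[s][a] - BETA * (R[s][a] - lam) ** 2 for a in range(A)]
               for s in range(S)]
    policy = value_iteration(r_tilde)
    new_lam = (1 - GAMMA) * mean_value(policy)[0]  # mean from state 0
    if abs(new_lam - lam) < 1e-10:
        break
    lam = new_lam
```

The outer iteration terminates either at a fixed point of the pseudo-mean update or at the iteration cap; the inner problems are ordinary discounted MDPs, which is the point of the transform.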
Similar resources
Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for oth...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Algorithmic aspects of mean-variance optimization in Markov decision processes
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for oth...
Simplex Algorithm for Countable-State Discounted Markov Decision Processes
We consider discounted Markov Decision Processes (MDPs) with countably-infinite state spaces, finite action spaces, and unbounded rewards. Typical examples of such MDPs are inventory management and queueing control problems in which there is no specific limit on the size of inventory or queue. Existing solution methods obtain a sequence of policies that converges to optimality i...
Risk-Sensitive and Mean Variance Optimality in Markov Decision Processes
In this note, we compare two approaches for handling risk-variability features arising in discrete-time Markov decision processes: models with exponential utility functions and mean variance optimality models. Computational approaches for finding optimal decision with respect to the optimality criteria mentioned above are presented and analytical results showing connections between the above op...
Journal
Journal title: European Journal of Operational Research
Year: 2023
ISSN: 1872-6860, 0377-2217
DOI: https://doi.org/10.1016/j.ejor.2023.06.022